18 research outputs found

    Distribution-based measures of tumor heterogeneity are sensitive to mutation calling and lack strong clinical predictive power.

    Mutant allele frequency distributions in cancer samples have been used to estimate intratumoral heterogeneity and its implications for patient survival. However, mutation calls are sensitive to the calling algorithm. It remains unknown whether the relationship of heterogeneity and clinical outcome is robust to these variations. To resolve this question, we studied the robustness of allele frequency distributions to the mutation callers MuTect, SomaticSniper, and VarScan in 4722 cancer samples from The Cancer Genome Atlas. We observed discrepancies among the results, particularly a pronounced difference between allele frequency distributions called by VarScan and SomaticSniper. Survival analysis showed little robust predictive power for heterogeneity as measured by Mutant-Allele Tumor Heterogeneity (MATH) score, with the exception of uterine corpus endometrial carcinoma. However, we found that variations in mutant allele frequencies were mediated by variations in copy number. Our results indicate that the clinical predictions associated with MATH score are primarily caused by copy number aberrations that alter mutant allele frequencies. Finally, we present a mathematical model of linear tumor evolution demonstrating why MATH score is insufficient for distinguishing different scenarios of tumor growth. Our findings elucidate the importance of allele frequency distributions as a measure for tumor heterogeneity and their prognostic role
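The MATH score named in this abstract is usually defined as 100 times the ratio of the median absolute deviation (MAD) to the median of the mutant-allele fractions at a tumor's mutated loci. The following is a minimal Python sketch under that assumption. The abstract does not state the exact implementation; the function name `math_score` and the example values are hypothetical, and the MAD scaling constant 1.4826 is the conventional choice rather than something confirmed by the source.

```python
from statistics import median

# Conventional constant that makes the MAD comparable to a standard
# deviation for normally distributed data (assumed, not from the abstract).
MAD_SCALE = 1.4826


def math_score(allele_fractions):
    """Return 100 * MAD / median for a list of mutant-allele fractions."""
    values = list(allele_fractions)
    if not values:
        raise ValueError("at least one mutant-allele fraction is required")
    center = median(values)
    if center == 0:
        raise ValueError("median allele fraction is zero; score is undefined")
    # Median absolute deviation around the median, rescaled.
    mad = MAD_SCALE * median([abs(v - center) for v in values])
    return 100.0 * mad / center


# Hypothetical allele fractions for one tumor sample.
example_fractions = [0.1, 0.2, 0.3, 0.4, 0.5]
print(round(math_score(example_fractions), 2))
```

Because the score depends only on the spread and center of the allele-fraction distribution, anything that shifts those fractions (for example a different mutation caller or a copy number change) changes the score, which is the sensitivity the abstract describes.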

    Pan-cancer classifications of tumor histological images using deep learning

    Histopathological images are essential for the diagnosis of cancer type and selection of optimal treatment. However, the current clinical process of manual inspection of images is time consuming and prone to intra- and inter-observer variability. Here we show that key aspects of cancer image analysis can be performed by deep convolutional neural networks (CNNs) across a wide spectrum of cancer types. In particular, we implement CNN architectures based on Google Inception v3 transfer learning to analyze 27815 H&E slides from 23 cohorts in The Cancer Genome Atlas in studies of tumor/normal status, cancer subtype, and mutation status. For 19 solid cancer types we are able to classify tumor/normal status of whole slide images with extremely high AUCs (0.995±0.008). We are also able to classify cancer subtypes within 10 tissue types with AUC values well above random expectations (micro-average 0.87±0.1). We then perform a cross-classification analysis of tumor/normal status across tumor types. We find that classifiers trained on one type are often effective in distinguishing tumor from normal in other cancer types, with the relationships among classifiers matching known cancer tissue relationships. For the more challenging problem of mutational status, we are able to classify TP53 mutations in three cancer types with AUCs from 0.65-0.80 using a fully-trained CNN, and with similar cross-classification accuracy across tissues. These studies demonstrate the power of CNNs for not only classifying histopathological images in diverse cancer types, but also for revealing shared biology between tumors. We have made software available at: https://github.com/javadnoorb/HistCNN (First author draft)
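The AUC values quoted in this abstract can be read as the probability that a randomly chosen positive example receives a higher classifier score than a randomly chosen negative one, and a micro-average pools every one-vs-rest decision before computing a single AUC. The Python sketch below illustrates both ideas. It is not the authors' code; the function names and the toy scores are hypothetical.

```python
def auc(positive_scores, negative_scores):
    """Rank-based AUC: fraction of (positive, negative) pairs ranked
    correctly, counting ties as half a correct pair."""
    if not positive_scores or not negative_scores:
        raise ValueError("need at least one positive and one negative score")
    correct = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positive_scores) * len(negative_scores))


def micro_average_auc(score_rows, labels):
    """Pool one-vs-rest scores over all classes, then compute one AUC.

    score_rows[i][k] is the score of sample i for class k;
    labels[i] is the index of the true class of sample i.
    """
    pooled_pos, pooled_neg = [], []
    for row, label in zip(score_rows, labels):
        for k, score in enumerate(row):
            (pooled_pos if k == label else pooled_neg).append(score)
    return auc(pooled_pos, pooled_neg)


# Toy three-class example with hypothetical scores.
rows = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
print(micro_average_auc(rows, [0, 1, 2]))
```

The pairwise loop is quadratic in the number of scores, which is fine for illustration; a sort-based rank computation would be the usual choice for slide-scale data.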

    Mutations in DNA repair genes are associated with increased neo-antigen load and activated T cell infiltration in lung adenocarcinoma.

    Mutations in DNA repair genes lead to increased genomic instability and mutation frequency. These mutations represent potential biomarkers for cancer immunotherapy efficacy, as high tumor mutational burden has been associated with increased neo-antigens and tumor infiltrating lymphocytes. While mismatch repair mutations have successfully predicted response to anti-PD-1 therapy in colorectal and other cancers, they have not yet been tested for lung cancer, and few have investigated genes from other DNA repair pathways. We utilized TCGA samples to comprehensively immunophenotype lung tumors and analyze the links between DNA repair mutations, neo-antigen and total mutational burden, and tumor immune infiltration. Overall, 73% of lung tumors contained infiltration by at least one T cell subset, with high mutational burden tumors containing significantly increased infiltration by activated CD4 and CD8 T cells. Further, mutations in mismatch repair genes, homologous recombination genes, or POLE accurately predicted increased tumor mutational burden, neo-antigen load, and T cell infiltration. Finally, neo-antigen load correlated with expression of M1-polarized macrophage genes, PD-1, PD-L1, IFNγ, GZMB, and FASLG, among other immune-related genes. Overall, after defining the immune infiltrate in lung tumors, we demonstrate the potential value of utilizing gene mutations from multiple DNA repair pathways as biomarkers for lung cancer immunotherapy. Oncotarget 2018; 9(8):7949-796

    Fostering bioinformatics education through skill development of professors: Big Genomic Data Skills Training for Professors.

    Bioinformatics has become an indispensable part of life science over the past 2 decades. However, bioinformatics education is not well integrated at the undergraduate level, especially in liberal arts colleges and regional universities in the United States. One significant obstacle pointed out by the Network for Integrating Bioinformatics into Life Sciences Education is the lack of faculty in the bioinformatics area. Most current life science professors did not acquire bioinformatics analysis skills during their own training. Consequently, a great number of undergraduate and graduate students do not get the chance to learn bioinformatics or computational biology skills within a structured curriculum during their education. To address this gap, we developed a module-based, week-long short course to train small college and regional university professors with essential bioinformatics skills. The bioinformatics modules were built to be adapted by the professor-trainees afterward and used in their own classes. All the course materials can be accessed at https://github.com/TheJacksonLaboratory/JAXBD2K-ShortCourse

    Deep learning-based cross-classifications reveal conserved spatial behaviors within tumor histological images.

    Histopathological images are a rich but incompletely explored data type for studying cancer. Manual inspection is time consuming, making it challenging to use for image data mining. Here we show that convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNN architectures to analyze 27,815 hematoxylin and eosin scanned images from The Cancer Genome Atlas for tumor/normal, cancer subtype, and mutation classification. Our CNNs are able to classify TCGA pathologist-annotated tumor/normal status of whole slide images (WSIs) in 19 cancer types with consistently high AUCs (0.995 ± 0.008), as well as subtypes with lower but significant accuracy (AUC 0.87 ± 0.1). Remarkably, tumor/normal CNNs trained on one tissue are effective in others (AUC 0.88 ± 0.11), with classifier relationships also recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45 ± 0.16 between classifier pairs. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. Patterns for TP53 mutations can also be detected, with WSI self- and cross-tissue AUCs ranging from 0.65-0.80. Finally, we comparatively evaluate CNNs on 170 breast and colon cancer images with pathologist-annotated nuclei, finding that both cellular and intercellular regions contribute to CNN accuracy. These results demonstrate the power of CNNs not only for histopathological classification, but also for cross-comparisons to reveal conserved spatial behaviors across tumors

    Mutations in DNA repair genes are associated with increased neoantigen burden and a distinct immunophenotype in lung squamous cell carcinoma.

    Deficiencies in DNA repair pathways, including mismatch repair (MMR), have been linked to higher tumor mutation burden and improved response to immune checkpoint inhibitors. However, the significance of MMR mutations in lung cancer has not been well characterized, and the relevance of other processes, including homologous recombination (HR) and polymerase epsilon (POLE) activity, remains unclear. Here, we analyzed a dataset of lung squamous cell carcinoma samples from The Cancer Genome Atlas. Variants in DNA repair genes were associated with increased tumor mutation and neoantigen burden, which in turn were linked with greater tumor infiltration by activated T cells. The subset of tumors with DNA repair gene variants but without T cell infiltration exhibited upregulation of TGF-β and Wnt pathway genes, and a combined score incorporating these genes and DNA repair status accurately predicted immune cell infiltration. Finally, high neoantigen burden was positively associated with genes related to cytolytic activity and immune checkpoints. These findings provide evidence that DNA repair pathway defects and immunomodulatory genes together lead to specific immunophenotypes in lung squamous cell carcinoma and could potentially serve as biomarkers for immunotherapy

    Deep learning-based cross-classifications reveal conserved spatial behaviors within tumor histological images

    Histopathological images are a rich but incompletely explored data type for studying cancer. Manual inspection is time consuming, making it challenging to use for image data mining. Here we show that convolutional neural networks (CNNs) can be systematically applied across cancer types, enabling comparisons to reveal shared spatial behaviors. We develop CNN architectures to analyze 27,815 hematoxylin and eosin scanned images from The Cancer Genome Atlas for tumor/normal, cancer subtype, and mutation classification. Our CNNs are able to classify TCGA pathologist-annotated tumor/normal status of whole slide images (WSIs) in 19 cancer types with consistently high AUCs (0.995 ± 0.008), as well as subtypes with lower but significant accuracy (AUC 0.87 ± 0.1). Remarkably, tumor/normal CNNs trained on one tissue are effective in others (AUC 0.88 ± 0.11), with classifier relationships also recapitulating known adenocarcinoma, carcinoma, and developmental biology. Moreover, classifier comparisons reveal intra-slide spatial similarities, with an average tile-level correlation of 0.45 ± 0.16 between classifier pairs. Breast cancers, bladder cancers, and uterine cancers have spatial patterns that are particularly easy to detect, suggesting these cancers can be canonical types for image analysis. Patterns for TP53 mutations can also be detected, with WSI self- and cross-tissue AUCs ranging from 0.65-0.80. Finally, we comparatively evaluate CNNs on 170 breast and colon cancer images with pathologist-annotated nuclei, finding that both cellular and intercellular regions contribute to CNN accuracy. These results demonstrate the power of CNNs not only for histopathological classification, but also for cross-comparisons to reveal conserved spatial behaviors across tumors. R01 CA230031 - NCI NIH HHS. Published version

    Integrating 5-Hydroxymethylcytosine into the Epigenomic Landscape of Human Embryonic Stem Cells

    Covalent modification of DNA distinguishes cellular identities and is crucial for regulating the pluripotency and differentiation of embryonic stem (ES) cells. The recent demonstration that 5-methylcytosine (5-mC) may be further modified to 5-hydroxymethylcytosine (5-hmC) in ES cells has revealed a novel regulatory paradigm to modulate the epigenetic landscape of pluripotency. To understand the role of 5-hmC in the epigenomic landscape of pluripotent cells, here we profile the genome-wide 5-hmC distribution and correlate it with the genomic profiles of 11 diverse histone modifications and six transcription factors in human ES cells. By integrating genomic 5-hmC signals with maps of histone enrichment, we link particular pluripotency-associated chromatin contexts with 5-hmC. Intriguingly, through additional correlations with defined chromatin signatures at promoter and enhancer subtypes, we show distinct enrichment of 5-hmC at enhancers marked with H3K4me1 and H3K27ac. These results suggest potential role(s) for 5-hmC in the regulation of specific promoters and enhancers. In addition, our results provide a detailed epigenomic map of 5-hmC from which to pursue future functional studies on the diverse regulatory roles associated with 5-hmC

    Clinical Characteristics, Racial Inequities, and Outcomes in Patients with Breast Cancer and COVID-19: A COVID-19 and Cancer Consortium (CCC19) Cohort Study

    BACKGROUND: Limited information is available for patients with breast cancer (BC) and coronavirus disease 2019 (COVID-19), especially among underrepresented racial/ethnic populations. METHODS: This is a COVID-19 and Cancer Consortium (CCC19) registry-based retrospective cohort study of females with active or history of BC and laboratory-confirmed severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) infection diagnosed between March 2020 and June 2021 in the US. Primary outcome was COVID-19 severity measured on a five-level ordinal scale, including none of the following complications, hospitalization, intensive care unit admission, mechanical ventilation, and all-cause mortality. Multivariable ordinal logistic regression model identified characteristics associated with COVID-19 severity. RESULTS: 1383 female patient records with BC and COVID-19 were included in the analysis; the median age was 61 years, and median follow-up was 90 days. Multivariable analysis revealed higher odds of COVID-19 severity for older age (aOR per decade, 1.48 [95% CI, 1.32-1.67]); Black patients (aOR, 1.74 [95% CI, 1.24-2.45]), Asian American and Pacific Islander patients (aOR, 3.40 [95% CI, 1.70-6.79]) and Other (aOR, 2.97 [95% CI, 1.71-5.17]) racial/ethnic groups; worse ECOG performance status (ECOG PS ≥2: aOR, 7.78 [95% CI, 4.83-12.5]); pre-existing cardiovascular (aOR, 2.26 [95% CI, 1.63-3.15])/pulmonary comorbidities (aOR, 1.65 [95% CI, 1.20-2.29]); diabetes mellitus (aOR, 2.25 [95% CI, 1.66-3.04]); and active and progressing cancer (aOR, 12.5 [95% CI, 6.89-22.6]). Hispanic ethnicity, timing, and type of anti-cancer therapy modalities were not significantly associated with worse COVID-19 outcomes. The total all-cause mortality and hospitalization rates for the entire cohort were 9% and 37%, respectively; however, they varied according to the BC disease status.
CONCLUSIONS: Using one of the largest registries on cancer and COVID-19, we identified patient and BC-related factors associated with worse COVID-19 outcomes. After adjusting for baseline characteristics, underrepresented racial/ethnic patients experienced worse outcomes compared to non-Hispanic White patients. FUNDING: This study was partly supported by National Cancer Institute grant number P30 CA068485 to Tianyi Sun, Sanjay Mishra, Benjamin French, Jeremy L Warner; P30-CA046592 to Christopher R Friese; P30 CA023100 for Rana R McKay; P30-CA054174 for Pankil K Shah and Dimpy P Shah; KL2 TR002646 for Pankil Shah and the American Cancer Society and Hope Foundation for Cancer Research (MRSG-16-152-01-CCE) and P30-CA054174 for Dimpy P Shah. REDCap is developed and supported by Vanderbilt Institute for Clinical and Translational Research grant support (UL1 TR000445 from NCATS/NIH). The funding sources had no role in the writing of the manuscript or the decision to submit it for publication. CLINICAL TRIAL NUMBER: CCC19 registry is registered on ClinicalTrials.gov, NCT04354701

    CloudNeo: A cloud pipeline for identifying patient-specific tumor neoantigens.

    Availability: The CWL implementation is at: https://github.com/TheJacksonLaboratory/CloudNeo. For users who have obtained licenses for all internal software, integrated versions in CWL and on the Seven Bridges Cancer Genomics Cloud platform (https://cgc.sbgenomics.com/, recommended version) can be obtained by contacting the authors. Bioinformatics 2017 Oct 1; 33(19):3110-3112. Contact: [email protected]